Compiling a Massive, Multilingual Dictionary via Probabilistic Inference
نویسندگان
چکیده
Can we automatically compose a large set of Wiktionaries and translation dictionaries to yield a massive, multilingual dictionary whose coverage is substantially greater than that of any of its constituent dictionaries? The composition of multiple translation dictionaries leads to a transitive inference problem: if word A translates to word B which in turn translates to word C, what is the probability that C is a translation of A? The paper introduces a novel algorithm that solves this problem for 10,000,000 words in more than 1,000 languages. The algorithm yields PANDICTIONARY, a novel multilingual dictionary. PANDICTIONARY contains more than four times as many translations than in the largest Wiktionary at precision 0.90 and over 200,000,000 pairwise translations in over 200,000 language pairs at precision 0.8.
منابع مشابه
Panlingual Lexical Translation via Probabilistic Inference
The bare minimum lexical resource required to translate between a pair of languages is a translation dictionary. Unfortunately, dictionaries exist only between a tiny fraction of the 49 million possible language-pairs making machine translation virtually impossible between most of the languages. This paper summarizes the last four years of our research motivated by the vision of panlingual comm...
متن کاملUsing TEX as a Tool in the Production of a Multi-Lingual Dictionary
We propose some simple design rules for encoding multilingual texts for exibility of further automatic processing. Our main recommendation is to include the available descriptive information with the data, and to use symbolic markup conventions. These rules have been applied successfully to the rst steps of compiling a multilingual dictionary.
متن کاملKnowledge Transfer across Multilingual Corpora via Latent Topics
This paper explores bridging the content of two different languages via latent topics. Specifically, we propose a unified probabilistic model to simultaneously model latent topics from bilingual corpora that discuss comparable content and use the topics as features in a cross-lingual, dictionary-less text categorization task. Experimental results on multilingual Wikipedia data show that the pro...
متن کاملGenerative Deep Deconvolutional Learning
A generative model is developed for deep (multi-layered) convolutional dictionary learning. A novel probabilistic pooling operation is integrated into the deep model, yielding efficient bottom-up (pretraining) and top-down (refinement) probabilistic learning. After learning the deep convolutional dictionary, testing is implemented via deconvolutional inference. To speed up this inference, a new...
متن کاملTransculturation and Multilingual Lives: Writing between Languages and Cultures
This paper looks at the issues of transculturation as explored in auto and semi-autobiographical accounts of linguistic and cultural transitions. The paper also addresses a number of questions about the structure of these texts, the authors’ linguistic competences, as well as questions about the theoretical and conceptual tool which may help us to discuss the issues the writers are reflecting o...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009